TOPICAL REVIEW 



Cluster Variation Method in Statistical Physics and 
Probabilistic Graphical Models 

Alessandro Pelizzola 

Dipartimento di Fisica, Politecnico di Torino, c. Duca degli Abruzzi 24, 10129 
Torino, Italy and INFN, Sezione di Torino 

Abstract. 
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techniques for discrete (Ising-like) models in equilibrium statistical mechanics, 
improving on the mean-field approximation and the Bethc-Pcicrls approximation, 
which can be regarded as the lowest level of the CVM. In recent years it has been 
applied both in statistical physics and to inference and optimization problems 
formulated in terms of probabilistic graphical models. 
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similar techniques are discussed. The main properties of the method are 
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1. Introduction 

The Cluster Variation Method (CVM) was introduced by Kikuchi 1, in 1951, 
as an approximation technique for the equiHbrium statistical mechanics of lattice 
(Ising-like) models, generalizing the Bethe-Peierls 12131 and Kramers- Wannier 01 |S] 
approximations, an account of which can be found in several textbooks [HJITj- Apart 
from rederiving these methods, Kikuchi proposed a combinatorial derivation of what 
today we can call the cube (respectively triangle, tetrahedron) approximation of the 
CVM for the Ising model on the simple cubic (respectively triangular, face centered 
cubic) lattice. 

After the first proposal, many reformulations and applications, mainly to the 
computation of phase diagram of lattice models in statistical physics and material 
science, appeared, and have been reviewed in [HI- The main line of activity has 
dealt with homogeneous, translation-invariant lattice models with classical, discrete 
degrees of freedom, but several other directions have been followed, including for 
instance models with continuous degrees of freedom P|, free surfaces ^lE], models 
of polymers El and quantum models El- Out of equilibrium properties 
have also been studied, in the framework of the path probability method ^1 117L [T5| , 
which is the dynamical version of the CVM. Despite the CVM predicts mean-field 
like critical behaviour, the problem of extracting critical behaviour from sequences 
of CVM approximations has also been considered by means of different approaches 

[111201111112211231. 

A line of research which is particularly relevant to the present discussion has 
considered heterogeneous and random models. Much work has been devoted in the 
80's to applications of the CVM to models with quenched random interactions (see e.g. 
|24| and refs. therein), mainly aiming to the phase diagram, and related equilibrium 
properties, of Ising-like models of spin glasses in the average case. The most common 
approach was based on the distribution of the effective fields, and population dynamics 
algorithms were developed and studied for the corresponding integral equations. All 
this effort was however limited at the replica-symmetric level. Approaches taking 
into account the first step of replica symmetry breaking have been developed only 
recently j^, at the level of the Bethe-Peierls approximation, in its cavity method 
formulation, for models on random graphs in both the single instance and average 
case. These approaches have been particularly successful in their application to 
combinatorial optimization problems, like satisfiability |2()| and graph coloring |27j . 
Another interesting approach going in a similar direction has been proposed recently 
|28|. which relies on the analysis of the time evolution of message-passing algorithms 
for the Bethe-Peierls approximation. 

Prompted by the interest in optimization and, more generally, inference problems, 
a lot of work on the CVM has been done in recent years also by researchers working 
on probabilistic graphical models [221) since the relation between the Bethe-Peierls 
approximation and the belief propagation method (30j was recognized (SI) . The 
interaction between the two communities of researchers working on statistical physics 
and optimization and inference algorithms then led to the discovery of several new 
algorithms for the CVM variational problem, and to a deeper understanding of 
the method itself. There have been applications in the fields of image restoration 
|32l 1881 1841 [3S1 , computer vision [^ , interference in two-dimensional channels [^ , 
decoding of error-correcting codes J£,.40,, diagnosis unwrapping of phase 
images 021 ; bioinformatics 031 ^3 BH > language processing 021 03 . 
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The purpose of the present paper is to give a short account of recent advances 
on methodological aspects, and therefore applications will not be considered in detail. 
It is not meant to be exhaustive and the material included reflects in some way the 
interests of the author. The plan of the paper is as follows. In Section [3 the basic 
definitions for statistical mechanics and probabilistic graphical models are given, and 
notation is established. In Section|3|the CVM is introduced in its modern formulation, 
and in Section0]it is compared with related approximation techniques. Its properties 
are then discussed, with particular emphasis on exact results, in Sectional Finally, 
the use of the CVM as an approximation and the algorithms which can be used to 
solve the CVM variational problem are illustrated in Sectional Conclusions are drawn 
in Section [7| 

2. Statistical mechanical models and probabilistic graphical models 

We are interested in dealing with models with discrete degrees of freedom which will 
be denoted by s = {si, S2, ■ • ■ sa?}. For instance, variables Si could take values in the 
set {0, 1} (binary variables), {—1, +1} (Ising spins), or {1, 2, . . .q} (Potts variables). 

Statistical mechanical models are defined through an energy function, usually 
called Hamiltonian, H = H{s), and the corresponding probability distribution at 
thermal equilibrium is the Boltzmann distribution 

p(s)-^exp[-i7(s)], (1) 

where the inverse temperature (3 — {kBT)~^ has been absorbed into the Hamiltonian 
and 

Z = exp(-F)=^exphiJ(s)] (2) 

s 

is the partition function, with F the free energy. 

The Hamiltonian is typically a sum of terms, each involving a small number of 
variables. A useful representation is given by the factor graph AS.. A factor graph is 
a bipartite graph made of variable nodes i, j, . . ., one for each variable, and function 
nodes a,b, . . ., one for each term of the Hamiltonian. An edge joins a variable node i 
and a function node a if and only if i G a, that is the variable Si appears in Ha, the 
term of the Hamiltonian associated to a. The Hamiltonian can then be written as 

H = ^Ha{Sa), Sa = {si,iea}. (3) 

a 

A simple example of a factor graph is reported in Figure ^ and the corresponding 
Hamiltonian is written as 

-ff(si, S2, S3, S4, S5, se) = Ha{si,S2) + Hb{s2, S3, S4) + Hc{s3, S4, S5, Se). (4) 

The factor graph representation is particularly useful for models with non- 
pairwise interactions. If the Hamiltonian contains only 1-variable and 2-variable 
terms, as in the Ising model 

iJ = - ^ hiSt - ^ Ji]SiSj, (5) 

then it is customary to draw a simpler graph, where only variable nodes appear, 
and edges are drawn between pairs of interacting spins {i,j). In physical models the 
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Figure 1. An example of a factor graph: variable and function nodes are denoted 
by circles and squares, respectively 

interaction strength can depend on the distance between spins, and interaction is 
often restricted to nearest neighbours (NNs), which are denoted by {i,j). 

In combinatorial optimization problems, the Hamiltonian plays the role of a cost 
function, and one is interested in the low-temperature limit T —^ 0, where only 
minimal energy states (ground states) have a non-vanishing probability. 

Probabilistic graphical models |29[ are usually defined in a slightly different 
way. In the case of Markov random fields, also called Markov networks, the joint 
distribution over all variables is given by 

P{S) ^ ^llM^a), (6) 

a 

where ipa is called potential (potentials involving only one variable are often called 
evidences) and 

Z = J2IlMSa). (7) 

s a 

Of course, a statistical mechanical model described by the Hamiltonian Q corresponds 
to a probabilistic graphical models with potentials V'a = exp(— iJ^). On the other 
hand, Bayesian networks, which we will not consider here in detail, are defined in 
terms of directed graphs and conditional probabilities. It must be noted, however, 
that a Bayesian network can always be mapped onto a Markov network |29| . 

3. Fundamentals of the Cluster Variation Method 

The original proposal by Kikuchi pP was based on an approximation for the number 
of configurations of a lattice model with assigned local expectation values. The 
formalism was rather involved to deal with in the general case, and since then many 
reformulations came. A first important step was taken by Barker |5(Jj . who derived 
a computationally useful expression for the entropy approximation. This was then 
rewritten as a cumulant expansion by Morita |51l I52j . and Schlijper noticed that 
this expansion could have been written in terms of a Mobius inversion. A clear and 
simple formulation was then eventually set up by An , and this is the one we shall 
follow below. 




Cluster Variation Method 



5 



The CVM can be derived from the variational principle of equilibrium statistical 
mechanics, where the free energy is given by 

F = - InZ = min J^{p) = min V [p{s)H{s) + p{s) Inp(s)] (8) 

S 

subject to the normalization constraint 

It is easily verified that the minimum is obtained for the Boltzmann distribution 

p{s) ^ — exp[—H{s)] = argminJF (10) 
Z 

and that the variational free energy can be written in the form of a KuUback-Leibler 
distance 

^(p)=F + Ep(^)ln||^. (11) 

The basic idea underlying the CVM is to treat exactly the first term (energy) of 
the variational free energy J-{p) in Equation and to approximate the second one 
(entropy) by means of a truncated cumulant expansion. 

We first define a cluster a as a subset of the factor graph such that if a factor node 
belongs to a, then all the variable nodes i £ a also belong to a (while the converse 
needs not to be true, otherwise the only legitimate clusters would be the connected 
components of the factor graph). Given a cluster we can define its energy 

i?a(Sa) =E-ffa(Sa), (12) 

probability distribution 
and entropy 

Sa = -Ypa{Sa)'^T^Pa{Sa)- (14) 

Then the entropy cumulants are defined by 

/3Cq 

which can be solved with respect to the cumulants by means of a Mobius inversion, 
which yields 

qC/3 

where Ua denotes the number of variables in cluster a. The variational free energy 
can then be written as 

.F(p)=Ep(^)ff(^)-E50, (17) 

13 

where the second summation is over all possible clusters. 
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The above equation is still an exact one, and here the approximation enters. A set 
-R of clusters, made of maximal clusters and all their subclusters, is selected, and the 
cumulant expansion of the entropy is truncated retaining only terms corresponding 
to clusters in R. In order to treat the energy term exactly it is necessary that each 
function node is contained in at least one maximal cluster. One gets 

^S*/? ~ ^ S*^ = ^ Oa^a, (18) 

where the coefficients sometimes called Mobius numbers, satisfy 

^ a^^l y(3eR. (19) 

(3(ZaeR 

The above condition means that every subcluster must be counted exactly once in 
the entropy expansion and allows to rewrite also the energy term as a sum of cluster 
energies, yielding the approximate variational free energy 

J^i{Pa, a e R}) = ^ aaJ^aiPa), (20) 

where the cluster free energies are given by 

^a{Pa) ^'^[PaiSa)HaiSa) + Pa{Sa)'^^Pa{Sa)] ■ (21) 

The CVM then amounts to the minimization of the above variational free energy with 
respect to the cluster probability distributions, subject to the normalization 

Y,pM^I yaeR (22) 
and compatibility constraints 

P/5(«/3) = I]Pa(Sa) V/3 c a G i?. (23) 

It is of great importance to observe that the above constraint set is approximate, 
in the sense that there are sets of cluster probability distributions that satisfy these 
constraints and nevertheless cannot be obtained as marginals of a joint probability 
distribution. An explicit example will be given in Sectional 

The simplest example is the pair approximation for a model with pairwise 
interactions, like the Ising model Q. The maximal clusters are the pairs of interacting 
variables, and the other clusters appearing in R arc the variable nodes. The pairs have 
Mobius number 1, while for the variable nodes = 1 — d^, where di is the degree of 
node i, that is, in the factor graph representation, the number of function nodes it 
belongs to. 

The quality of the approximation (|18|l depends on the value of the neglected 
cumulants. In the applications to lattice systems it is typically assumed that, since 
cumulants are related to correlations, they vanish quickly for clusters larger than the 
correlation length of the model. In Figure [3 the first cumulants, relative to the site 
(single variable) entropy, are shown for the homogeneous ( Jy — J), zero field {hi = 0), 
square lattice Ising model, in the square approximation of the CVM. 

It can be seen that the cumulants peak at the (approximate) critical point and 
decrease as the cluster size increases. This property is not however completely general, 
it may depend on the interaction range. It has been shown 55 that this does not 
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Figure 2. Cumulants for the square lattice Ising model 



hold for finite instances of tfie Sfierrington-Kirkpatrick spin-glass model, which is a 
fully connected model. 

The meaning of cumulants as a measure of correlation can be easily understood 
by considering a pair of weakly correlated variables and writing their joint distribution 
as 



£ < 1. 



P12(S1, S2) = Pl(si)P2(s2) [1 + £(?(si, S2)] , 

The corresponding cumulant is then 

^12 = S12 -Si~S2^ -(In [1 + eq{si,S2)]) = 0(e). 



(24) 
(25) 



4. Region based free energy approximations 

The idea of region-based free energy approximations, put forward by Yedidia |56| , is 
quite useful to elucidate some of the characteristics of the method, and its relations 
to other techniques. A region-based free energy approximation is formally similar to 
the CVM, and can be defined through equations (^01) and |(2U, but the requirements 
on the coefficients are weaker. The single counting condition is imposed only on 
variable and function nodes, instead of all subclusters: 

Oq = 1 Va, (26) 
aa = l Vz. (27) 
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Figure 3. Factor graph of a model for which the Bethe-Peierls approximation is 
not a special case of the CVM 

Interesting particular cases are obtained if R contains only two types of regions, 
large regions and small regions. The junction graph method .St) } 1571 is obtained if they 
form a directed graph, with edges from large to small regions, such that: 

(i) every edge connects a large region with a small region which is a subset of the 
former; 

(ii) the subgraph of the regions containing a given node is a connected tree. 

On the other hand, the Bethe-Peierls approximation, in its most general formulation, 
is obtained by taking function nodes (with the associated variable nodes) as large 
regions and variable nodes as small regions. This reduces to the usual statistical 
physics formulation in the case of pairwise interactions. 

The CVM is a special region-based free energy approximation, with the property 
that R is closed under intersection. Indeed, one could define R for the CVM as the 
set made of the maximal clusters and all the clusters which can be obtained by taking 
all the possible intersections of (any number of) maximal clusters. 

It is easy to verify that the Bethe-Peierls approximation is a special case of CVM 
only if no function node shares more than one variable node with another function 
node. If this is not the case, one should be careful when applying the Bethe-Peierls 
approximation. Consider a model with the factor graph depicted in Figure 13 where 
s, = ±1 (i = 1,2,3,4), H = Ha + Hb and 

Haisi,S2,S3) = -hoSi - ^{S2 + S3) - -^"515253, (28) 

Hbis2, 53,54) = -hoS4 - ^{s2 + S3) - JS2S3S4. (29) 

The CVM, with function nodes as maximal clusters, is exact (notice that it 
coincides with the junction graph method), and the corresponding exact cumulant 
expansion for the entropy is 

S = Sa+Sb- S23, (30) 

while the Bethe-Peierls entropy is 

Sbp — Sa + Sb S2 ~ S3. (31) 

The two entropies differ by the cumulant S23 = S23 — S2 — S3, and hence 
correlations between variable nodes 2 and 3 cannot be captured by the Bethe- 
Peierls approximation. In Figure 0] it is clearly illustrated how the Bethe-Peierls 
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Figure 4. Entropy of the Bethe-Peierls approximation vs the exact one for a 
model for which the Bethe— Peierls approximation is not a special case of the CVM 



approximation can fail. At large enough J the exact entropy is larger (by roughly 
In 2) than the Bethc-Peierls one. 

5. Exactly solvable cases 

The CVM is known to be exact in several cases, due to the topology of the underlying 
graph, or to the special form of the Hamiltonian. In the present section we shall first 
consider cases in which the CVM is exact due to the graph topology, then proceed to 
the issue of realizability and consider cases where the form of the Hamiltonian makes 
an exact solution feasible with the CVM. 

5.1. Tree-like graphs 

It is well known that the CVM is exact for models defined on tree-like graphs. This 
statement can be made more precise by referring to the concept oi junction tree |58[l59j . 
which we shall actually use in its generalized form given by Yedidia, Freeman and Weiss 
[56;. A junction tree is a tree-like junction graph. The corresponding large regions 
are often called cliques, and the small regions separators. With reference to Figure 13 
it is easy to check that the CVM, as described in the previous section, corresponds to 
a junction tree with cliques (al23) and (6234) and separator (23), while the junction 
graph corresponding to the Bethe-Peierls approximation is not a tree. 

For a model defined on a junction tree the joint probability distribution factors 

EDI according to 



p{s) 



aeRL 



(32) 
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Figure 5. A one— dimensional strip and tlie clusters used to solve a pairwise 
model on it 



where Rl and Rs denote the sets of large and smaU regions, respectively, and dp is 
the degree of the small region (3 in the junction tree. Notice that no normalization is 
needed. 

The above factorization of the probability leads to the exact cumulant expansion 
S= J2 Sa- 1)^/3' (33) 

and therefore the CVM with R = Rl U Rs is exact. 

As a first example, consider a particular subset of the square lattice, the strip 
depicted in Figure |S1 with open boundary conditions in the horizontal direction, 
and define on it a model with pairwise interactions (we do not use the factor graph 
representation here). 

According to the junction tree property, the joint probability factors as follows: 

n P«(Sa) 

where I and II denote the sets of chains (except boundary ones) and ladders, 
respectively, shown in Figure El As a consequence, the cumulant expansion 

S^Y.^--Y.^P (35) 

qgii f3ei 

of the entropy is also exact, and the cluster variation method with i? = II U I is exact. 
For strip width = 1 we obtain the well-known statement that the Bethe-Peierls 
approximation is exact for a one-dimensional chain. Rigorous proofs of this statement 
have been given by Brascamp (61) and Percus ^62^ . More generally, Schlijper has shown 
that the equilibrium probability of a d-dimensional statistical mechanical model 
with finite range interactions is completely determined by its restrictions (marginals) 
to d — 1-dimensional slices of width at least equal to the interaction range. 



11 




In the infinite length limit L oo translational invariance is recovered 
^([pii(s,s')i?ii(s,s') +j9ii(s,s')lnpii(s,s')] -^]3i(s)lnpi(s) (36) 



L 

s,s' 



and solving for pn wc obtain the transfer matrix formalism: 

J = -\num^\Y,p\'\s)e^^[-Hn{s,s')]p\'\s') ) (37) 



Y.Pi{s) = l (38) 

s 

The natural iteration method (see section [6.3(1 in this case reduces to the power 
method for finding the largest eigenvalue of the transfer matrix. 

As a second example, consider a tree, like the one depicted in Figure and a 
model with pairwise interactions defined on it. 

In this case the probability factors according to 

i 

where {ij) denotes a pair of adjacent nodes. The cumulant expansion of the entropy 
is therefore 

and the pair approximation of the CVM (coinciding with Bethe-Peierls and junction 
graph) is exact. Recently this property has been exploited to study models on finite 
connectivity random graphs, which strictly speaking are not tree-like: loops are 
present, but in the thermodynamic limit their typical length scales like IniV (64j . 

As a final example, consider the so-called (square) cactus lattice (the interior of 
a Husimi tree), depicted in Figure [7| 

Here the probability factors according to 

WpuM 

Pis) = ^T—-^ (41) 
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Figure 7. A small portion of a square cactus lattice 



the entropy cumulant expansion takes the form 

□ 

and the CVM with R made of square plaquettes and sites is exact. Again, this 
coincides with the junction graph method and, if function nodes are associated to 
square plaquettes (so that the corresponding factor graph is tree-hke), with Bethe- 
Peierls. 

5.2. Realizability 

We have seen that when the probabihty factors in a suitable way, the CVM can be 
used to find an exact solution. By analogy, we could ask whether, as in the mean field 
approximation, CVM approximations can yield an estimate of the joint probability 
distribution as a function of the cluster distributions, in a factorized form. In the 
general case, the answer is negative. One cannot, using a trial factorized form like 

\[[pMT- (43) 

a 

(which would lead to a free energy like that in Eas. l2(Jl2T|l . obtain a joint probability 
distribution which marginalizes down to the cluster probability distributions used as 
a starting point. As a consequence, we have no guarantee that the CVM free energy is 
an upper bound to the exact free energy. Moreover, in sufficiently frustrated problems, 
the cluster probability distributions cannot even be regarded as marginals of a joint 
probability distribution |65| . 

It can be easily checked that Equation (|43|l is not, in the general case, a probability 
distribution. It is not normalized and therefore its marginals do not coincide with the 
Pa's used to build it. At best, one can show that 

nK(s«r° «exphi/(s)], (44) 

but the normalization constant is unknown. This has been proven in at the Bethe- 
Peierls level, and the proof can be easily generalized to any CVM approximation. 

Let us now focus on a very simple example. Consider three Ising variables, 
Si = ±1, z = 1, 2, 3, with the following node and pair probabilities: 

p,{si) = \/2 z = 1,2,3 (45) 
Pijis^, Sj) = i±|£l£l ^ -1 < c < 1, i<j. (46) 
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A joint p{si, S2, S3) marginalizing to the above probabilities exists for —1/3 < c < 1, 
which shows clearly that the constraint set Equation (|23|) is approximate, and in 
particular it can be too loose. For instance, in it has been shown that due to 
this problem the Bethe-Peierls approximation for the triangular Ising antiferromagnet 
predicts, at low temperature, unphysical results for the correlations and a negative 
entropy. 

Moreover, the joint probability p{si, S2, S3) is given by the CVM-like factorized 

form 

Pujsi, S2)P13{S1, S3)P23{S2, S3) ^^^^ 
P1{S1)P2{S2)P3{S3) 

only if c = 0, that is if the variables are completely uncorrelated. 

As a more interesting case, we shall consider in the next subsection the square 
lattice Ising model. In this case it has been shown |68[ I69| that requiring realizability 
yields an exactly solvable case. 



5.3. Disorder points 

For a homogeneous (translation-invariant) model defined on a square lattice, the 
square approximation of the CVM, that is the approximation obtained by taking the 
elementary square plaquettes as maximal clusters, entails the following approximate 
entropy expansion: 

□ (u> 

The corresponding factorization 

npn(*n) I[p^^('^' ^j) Up^i^^) (49) 

□ {^3) i 

for the probability does not, in general, give an approximation to the exact equilibrium 
distribution. Indeed, it does not marginalize to the cluster distributions and is not 
even normalized. 

One could, however, try to impose that the joint probability given by the above 
factorization marginalizes to the cluster distributions. It turns out that it is sufficient 
to impose such a condition on the probability distribution of a 3 x 3 square, like the 
one depicted in Figure |S1 It is easy to check that for an Ising model the CVM-like 
function 

P4(si, 52, S5,S4,)P4:{S2, S3, Sfj, 85)^4(54, ^5, ^8, 57)^4(55, Sg,Sq, Ss)pi{s5) 
P2(S2, S5)P2(S5, Ss)p2isi, S5)P2(s5, Sg) 

marginalizes to the square, pair and site distributions (p4, p2 andpi respectively) only 
if odd expectation values vanish and 

{SlSk){{^k)) = {SiSj)%j), (51) 

where the l.h.s. is the next nearest neighbour correlation, while the r.h.s. is the square 
of the nearest neighbour correlation. 

Leaving apart the trivial non-interacting case, the above condition is satisfied 
by an Ising model with nearest neighbour, next nearest neighbour and plaquette 
interactions, described by the Hamiltonian 

H = - Ji ^ SiSj - ^2 ^ SiSj - '/4 ^ SiSjSkSi, (52) 
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Figure 8. A 3 X 3 square on the square lattice 



if the couplings satisfy the disorder condition (see |68) and refs. therein) 

g4J2+2J4 ^ g-4J2+2J4 ^ 2e~^-^^ 
^osH2Ji) = 2(e2J2+e2J4) ' ^^^^ 

This defines a variety in the parameter space, lying in the disordered phase of the 
model, and in particular in the region where nearest neighbour and next nearest 
neighbour interactions compete. In this case the square approximation of the CVM 
yields the exact solution, including the exact free energy density 

/ = - In [cxp(- J4) + exp( J4 - 2 J2)] , (54) 
and the nearest neighbour correlation 

exp(-4J2) - cosh(2 Ji) 
' = = ^M^) ■ ^''^ 

Higher order correlations can be derived from the joint probability Equation H49() . for 
example the two-body correlation function r(a;, y) = {s{xo,yQ)s{xQ + x,yo+y)) (where 
spin variables have been identified by their coordinates on the lattice), which simply 
reduces to a power of the nearest neighbour correlation: r{x,y) = For this 

reason a line of disorder points is often referred to as a one-dimensional line. Or the 
plaquette correlation: 

(l-e8-'2)+4e2-^2(e2 

Finally, since all the pair correlations are given simply as powers of the nearest- 
neighbour correlation we can easily calculate the momentum space correlation 

function, or structure factor. We first write r(a;, y) — exp I , where 



q = {s^S,SkSl)a = ,4J.n _.8.7,^^..2,7W.2J,^.2J,^ ' (^6) 



^ = — (Ing) ^. After a Fourier transform one finds S{px,Py) — Si{px)Si{py), where 

sinh(l/^) 

Slip) = — r7rrF\ ■ 

cosh(l/4) — cosp 

It can be verified that the structure factors calculated by Sanchez TTT and (except for 
a misprint) Cirillo and coworkers |71| reduce to the above expression on the disorder 
line. 
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Figure 9. A typical configuration of the Mufioz— Eaton model. An empty (resp. 
filled) circle at row i and column j represents the variable Xij taking value 
(resp. 1). 



5.4- Wako-Saitd-MuHoz-Eaton model of protein folding 

There is at least another case in which the probabihty factors at the level of square 
plaquettes, and the CVM yields the exact solution. It is the Wako-Saito-Muhoz- 
Eaton model of protein folding |721|7ni|71|[71[7ni[771[7Sl[7H|. Here we wiU not delve 
into the details of the model, giving only its Hamiltonian in the form 

L L j 
H = ^^hi^jXij, Xij^Y\_Xk, Xk=0,l. (58) 

i—l j—i k—i 

It is a one-dimensional model with arbitrary range multivariable interactions, but the 
particular form of these interactions makes an exact solution possible. A crucial step 
in this solution was the mapping to a two-dimensional model |77| , where the statistical 
variables are the Xij^s (see Figure]^ for an illustration). In terms of these variables 
the Hamiltonian is local, and the only price one has to pay is to take into account the 
constraints 

which can be viewed as local interactions. 

In order to derive the factorization of the probability |73, we need first to exploit 
the locality of interactions, which allows us to write 

,(1.2)„(2,3) . . .„(L-1,L) 
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where p^-'-' denotes the probabiUty of the jth row in Figure |51 and p'-i'i+^'l denotes the 
joint probabihty of rows j and j + 1. 

As a second step, consider the effect of the constraints. This is best understood 
looking at the fohowing example: 

_P^k0,0)---pi;^i(0,l)...p5!^i^^(l,l) 



p^'^(o)---pF(o)pii\(i)---p?2i(i) 



-(61) 



The CVM-hke factorization is possible since every factor in the numerator, except 
Pi"2fi(0, 1), cancels with a factor in the denominator. A similar result can be obtained 
for the joint probability of two adjacent rows, and substituting into H6U|) one eventually 
gets 

P{{x^A)^llp»iXc.r'', (62) 

where R = {square plaquettes, corners (on the diagonal), and their subclusters} and 
Qa is the CVM Mobius number for cluster a. 



6. Cluster Variation Method as an approximation 

In most applications the CVM does not yield exact results, and hence it is worth 
investigating its properties as an approximation. 

An important issue is the choice of maximal clusters, and in particular the 
existence of sequence of approximations (that is, sequence of choices of maximal 
clusters) with some property of convergence to the exact results. This has been 
long studied in the literature regarding applications to lattice, translation invariant, 
systems and will be the subject of subsection 16.11 In particular, rigorous results 
concerning sequences of approximations which converge to the exact solution have 
been derived by Schlijper EHl IHOji providing a sound theoretical basis for the 
earlier investigations by Kikuchi and Brush |81j . 

Another important issue is related to the practical determination of the minima of 
the CVM variational free energy. In the variational formulation of statistical mechanics 
the free energy is convex, but this property here is lost due to the presence of negative 
tta coefficients in the entropy expansion. More precisely, it has been shown [221 that 
the CVM variational free energy is convex if 

yS QR ^ fla > Rs^{ae R\3f3 Ca,f3e S}. (63) 

qG-Rs 

Similar conditions have been obtained by McEliece and Yildirim |HSI and Heskes, 
Albers and Kappen |84j . 

If this is not the case multiple minima can appear, and their determination can 
be nontrivial. Several algorithms have been developed to deal with this problem, 
falling mainly in two classes: message-passing algorithms, which will be discussed in 
subsection l6.2l and variational, provably convergent algorithms, which will be discussed 
in subsection 16.31 
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6.1. Asymptotic behaviour 

The first studies on the asymptotic behaviour of sequences of CVM approximations 
are due to Schhjper [SSI EU] . He showed that it is possible to build sequences of 
CVM approximations (that is, sequences of sets of maximal clusters) such that the 
corresponding sequence of free energies converge to the exact one, for a translation- 
invariant model in the thermodynamic limit. The most interesting result, related 
to the transfer matrix idea, is that for a d-dimensional model the maximal clusters 
considered have to increase in d — 1 dimensions only. 

These results provided a theoretical justification for the series of approximations 
developed by Kikuchi and Brush |81| , who introduced the B2L series of approximations 
for translation-invariant models on the two-dimensional square lattice, based on 
zig-zag maximal clusters, as shown in Figure ^1 and applied it to the zero field 
Ising model. Based solely on the results from the B2 (which is equivalent to the 
plaquette approximation) and B4 approximations, Kikuchi and Brush postulated a 
linear behaviour for the estimate of the critical temperature as a function of (2L + 1)^^. 
1 3 ... 2L + 1 




2 ■ ■ ■ 2L 

Figure 10. Maximal cluster for the B2L approximation 



In Figure 1111 we have reported the inverse critical temperature as a function of 
(2L+1)~^ for L = 1 to 6. The extrapolated inverse critical temperature is /3c — 0.4397, 
to be compared with the exactly known (3c = ^ ln(l + v^) — 0.4407. 

It is not our purpose here to make a complete finite size scaling analysis, in the 
spirit of the coherent anomaly method (see below), of the CVM approximation series. 
We limit ourselves to show the finite size behaviour of the critical magnetization. 
More precisely, we have computed the magnetization of the zero field Ising model on 
the square lattice at the exactly known inverse critical temperature, again for L — I 
to 6. Figure [T^ shows that the critical magnetization vanishes as (2L + 1)^^'^, and 
the fit gives a very good estimate for the exponent, consistent with the exact result 
P/iy = 1/8. 

As a further illustration of the asymptotic properties of the B2L series we 
report in Figure ^1 the zero temperature entropy (actually the difference between 
the extrapolated entropy density and the B2L estimate) of the Ising triangular 
antiferromagnet as a function of 1/L [SZj. It is clearly seen that asymptotically 
sl = sq ~ aL^^, and the fit yields the numerical results so ~ 0.323126 (the exact 
value being s « 0.323066) and ip « 1.7512 (remarkably close to 7/4). 

An attempt to extract non-classical critical behaviour from high precision low and 
high temperature results from CVM was made by the present author ^| 1201 121 1221 j 
using Fade and Adler approximants. This work has led to the development of an 18 
(3x3x2) site cluster approximation for the simple cubic lattice Ising model |22j . 
which is probably the largest cluster ever considered. The results obtained for the 
Ising model are still compatible with the most recent estimates [SHIj although of lower 
precision. 

It has also been considered the possibility of extracting non-classical critical 
behaviour from CVM results by means of the coherent anomaly method, which applies 
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Figure 11. Inverse critical temperature of the B2L approximation series 
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Figure 12. Critical temperature of the B2L approximation series 
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finite size scaling ideas to series of mean-field-like approximations. A review of these 
results can be found in |23| . 



6.2. Message-passing algorithms 

In order to describe this class of algorithms it is useful to start with the Bethe- 
Peierls approximation (pair approximation of the CVM) free energy for the Ising 
model Equation 

= - ^ /li ^ SiPi{Si) - ^ Jij ^ S^SjPtj{Si, Sj) + 



I Si 



+ ^ ^ Ptj{si,Sj)lnpij{s,,Sj) - '^{di - l)'^pi{s^)\np^{si) 

(ij) Si.Sj i Si 

i \ Si / (ij) \s,,Sj J 



E 



(64) 
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One can easily recognize the energy terms, the pair entropy, the site entropy (with 
a Mobius number — (d, — 1), where di is the degree of node i), and the Lagrange 
terms corresponding to the normahzation and pair-site compatibiUty constraints. 
Observe that, due the presence of normahzation constraints, it is enough to impose 
the consistency between the spin expectation values given by the site and pair 
probabilities. 

A physical way of deriving message-passing algorithms for the determination 
of the stationary points of the above free energy is through the introduction of the 
effective field representation. Consider the interaction Jik and assume that, whenever 
this is not taken into account exactly, its effect on variable Si can be replaced by 
an effective field hi^k- This can be made rigorous by observing that the stationarity 
conditions 





= 



dpij{si,Sj) 

can be solved by writing the probabilities as 



(65) 



PiiSi) 



exp 



Pij 



exp 



(66) 
(67) 



( k^j \ I k^i \ 

hi + ^ hi^k \ Si + \ hj + ^ hj^k Sj + JijSiSj 

V /cNNi / y feNNj J 

where the effective fields, and the site and pair free energies Fi and Fij , are related to 
the Lagrange multipliers through 

A, ^ (d, -!)(! + F,) 

Xij = - 1 - Fij 



k^j 

J2 ^^^k- (68) 

Fi and Fij are determined by the normalization, but first of all the effective fields must 
be computed by imposing the corresponding compatibility constraints, which can be 
cast into the form 



ft-ij — tanh 



tanh 




tanhj,-. 



(69) 



This is a set of coupled nonlinear equations which is often solved by iteration, 
that is an initial guess is made for the /i^j 's, plugged into the r.h.s. of Equation 
which returns a new estimate, and the procedure is then repeated until a fixed point 
is (hopefully) reached. The values of the effective fields at the fixed point can then be 
used to compute the probabilities according to Equation (|67|l . 

The above equations, and their generalizations at the CVM level, have been 
intensively used in the 80's for studying the average behaviour of models with quenched 
random interactions, like Ising spin glass models. This work was started by a paper by 
Morita [SHI, where an integral equation for the probability distribution of the effective 
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field, given the probability distributions of the interactions and fields, was derived. In 
the general case this integral equation takes the form 



Phjif^hj) = / ^ I - tanh ^ 




tanh I /i, + > /ii./c I tanhJij 



X Pj{hj)dhjPij{Jij)dJij Y[ Pj,k{^j,k)dhj,k, (70) 

fcNN j 

with simplifications occurring if the probability distributions can be assumed to be 
site-independent, which is the most studied case. Typical calculations concerned: the 
determination of elements of the phase diagrams of Ising spin glass models, through 
the calculation of the instability loci of the paramagnetic solution; results in the 
zero temperature limit, where solutions with a discrete support are found; iterative 
numerical solutions of the integral equation. A review of this line of research until 
1986 can be found in 87 . It is important to notice that the results obtained by this 
approach are equivalent to those by the replica method, at the replica symmetric level. 

The effective field approach is reminiscent of the message-passing procedure 
at the heart of the belief propagation (BP) algorithm, and indeed the messages 
appearing in this algorithm are related, in the Ising case, to the effective fields by 
m(^ij-^^i{si) = exp(/iijSi), where m(ij-^^i{si) denotes a message going from the NN 
pair {ij) to node i. 

In order to derive the BP algorithm consider the Bethe-Peierls approximation for 
a model with variable nodes i and factor nodes a. The variables Si need not be limited 
to two states and the Hamiltonian is written in the general form Equation ©. 

The CVM free energy, with the normalization and compatibility constraints, can 
then be written as 

^ = - X! X! Ha{Sa)Pa{Sa) + 
a s„ 

+ ^^PQ(Sa)lnpa(Sa) - ^(rfj - l)'^Pi{s.i)hipi[si) + 
a Sfj, i Si 

+ (Ep»(^^) - 1 ) + E^« (EEp'^(«") - 1 ) + 

i \ Si / a \ a So, / 

+ EEE'^«.'(^') - E^'-(*-) ' (^1) 

where Sa,\i denotes the set of variables entering factor node a, except i. 
The stationarity conditions 

OPa[Sa) 

can then be solved, in analogy with Equation H67|l. by 

Pi{Si) = ]^ma^i(Si) 

2Ga 
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1 



Pa{Sa) = -^■>Pa{Sa)Y[Yi'^b^k{sk)- (73) 

" kea keb 

In the particular case of an Ising model with pairwise interactions, the previously 
mentioned relationship between messages and effective fields is evident from the above 
equation. 

Now Zi and Za take care of normalization, and the messages ma^i{si) are related 
to the Lagrange multipliers by 

i^a.k{sk) = ^lnm(,^fc(sfc). (74) 
fceb 

Notice that the messages can be regarded as exponentials of a new set of Lagrange 
multipliers, with the constraints rewritten as in the following free energy 

a So. 

+ '^'^PaiSa) \npa{8a) - '^{di - 1) ^Pi(si) \npi{si) + 
a So, i Si 

+ [J2p^^^^^-A +E^» (Y.Y.p-^^" 

i \ Si / a \ a So. 

a i£a Si \ i^b Sb\i / 

Again, imposing compatibility between variable nodes and factor nodes, one gets 
a set of coupled equations for the messages which, leaving apart normalization, read 

rUa^iiSi) CX ^ ijjaiSa) Y[ If ^b~,k{sk)- (76) 
Sa,\i k^a k(^b 

The above equations, and their iterative solution, are the core of the BP algorithm. 
Also, their structure justifies the name "Sum-Product" which is often given them 
in the literature on probabilistic graphical models, and the corresponding term "Max- 
Product" for their zero temperature limit. 

There are several issues which must be considered when discussing the property 
of an iterative algorithm based on Equation (|76|) . First of all, one could ask whether 
messages have to be updated sequentially or in parallel. This degree of freedom does 
not affect the fixed points of the algorithm, but it affects the dynamics. This issue has 
been considered in some depth by Kfir and Kanter in the context of the decoding 
of error-correcting codes. In that case they showed that the sequential update results 
in twice faster convergence with respect to the parallel update. 

Convergence is however not guaranteed if the underlying graph is not tree-like, 
that is if the pair approximation of the CVM is not exact. This issue has been 
investigated theoretically by Tatikonda and Jordan [SS], Mooij and Kappen PU]. 
Ihler et al [5T], who derived sufficient conditions for convergence, and by Heskes 
P2j . who derived sufficient conditions for the uniqueness of the fixed point. In 
practice it is typically observed that the BP algorithm converges if the frustration 
due to competitive interactions, like those characteristic of spin-glass or constraint 
satisfaction models, is not too large. In some cases, the trick of damping, or inertia. 
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can help extending the convergence domain. The trick consists in taking the updated 
message equal to a weighted (possibly geometrical) average of the old message and 
the new one given by Equation H7()|) . The convergence domain of the BP algorithm 
has been determined for several problems, like satisfiability j26| . graph colouring |27j . 
error correcting codes jin] and spin glasses Within its convergence domain, the 
BP algorithm is indeed very fast, and this is its real strength. See the next subsection 
for some performance tests and a comparison with provably convergent algorithms. 

Once a fixed point has been obtained it is worth asking whether this corresponds 
to a minimum of the free energy or not. This has been partially solved by Heskes }94| . 
who has shown that stable fixed points of the belief propagation algorithm are minima 
of the CVM pair approximation free energy, but the converse is not necessarily true. 
Actually, examples can be found of minima of the free energy which correspond to 
unstable fixed points of the belief propagation algorithm. 

An important advancement in this topic is the generalized belief propagation 
(GBP) algorithm by Yedidia and coworkers The fixed points of the GBP 

algorithm for a certain choice of clusters correspond to stationary points of the CVM 
free energy at the approximation level corresponding by the same choice of clusters or, 
more generally, of a region graph free energy. Actually, for a given choice of clusters, 
different GBP algorithms can be devised. Here only the so-called parent to child GBP 
algorithm [^Hl will be considered. Other choices are described in |56| . 

In order to better understand this algorithm, notice a few characteristics of the 
belief propagation algorithm. First of all, looking at the probabilities Equation (|73|l 
one can say that a variable node receives messages from all the factor nodes it belongs 
to, while a factor node a receives messages from all the other factor nodes to which its 
variable nodes i d a belong. In addition, the constraint corresponding to the message 
ma~ii{si) (see Equation (|75|l l can be written as 

'^Pa{Sa) = ^ ^Pb{Sb) - {di - l)p^{Si). (77) 

The parent to child GBP algorithm generalizes these characteristics in a rather 
straightforward way. First of all, messages TOq.^/3(s^) (/? C a) are introduced from 
regions (parent regions) to subregions (child regions). Then, the probability of a region 
takes into account messages coming from outer regions to itself and its subregions. 
Finally, exploiting the property Equation l|19|) of the Mobius numbers, the constraint 
corresponding to mQ,^^(s^) is written in the form 

It can be shown j5(i| that this new set of constraints is equivalent to the original one. 

To make this more rigorous, consider the free energy given by Equations 
l)2()|l and lj2U, with the above compatibility constraints (with Lagrange multipliers 
lnmQ,^/3(s^)) and the usual normalization constraints (with multipliers Aq). One 
obtains 



J2 ""-r 



= X] ^iY\Pi(^'f)^'fi^i) +^7(^7) 111^7(57)] + X! •^') 



JGR 



- ^lnm„_,0(S;3) 

ffCaeR S/3 



■yeR 



Y o-fYp-t^^^^ " 07^^7(5^) 



aCj^R 



(79) 
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where it is not necessary to put all the possible a — s- /3 compatibility constraints, but 
it is enough to put those which satisfy 7^ 0, 7^ and /5 is a direct subregion of 
a, that is there is no region 7 with ^ such that /3 C 7 C a. Notice also that the 
Lagrange term corresponding to the a ^ [3 constraint can be written as 

-Inma-^pisp) ^ a^^p^(s^). (80) 

The stationarity conditions 

j^^^, = (81) 
can then be solved, leaving apart normalization, by 

p^(s^) (X exp [-i?^(s^)] J]^ ma^l3{si3), (82) 

/3C7/3Caei?, 

where sp denotes the restriction of to subregion [3. 

Finally, message update rules can be derived again by the compatibility 
constraints, though some care is needed, since in the general case these constraints 
are not immediately solved with respect to the (updated) messages, as it occurs in 
the derivation of Equation (|76|l . Here one obtains a coupled set of equations in the 
updated messages, which can be solved starting from the constraints involving the 
smallest clusters. 

An example can be helpful here. Consider a model defined on a regular square 
lattice, with periodic boundary conditions, and the CVM square approximation, 
that is the approximation obtained by taking the elementary square plaquettes as 
maximal clusters. The entropy expansion contains only terms for square plaquettes 
(with Mobius numbers 1), NN pairs (Mobius numbers -1) and single nodes (Mobius 
numbers 1), as in Equation (|48|) . A minimal set of compatibility constraints includes 
node-pair and pair-square constraints, and one has therefore to deal with square-to- 
pair and pair-to-node messages, which will be denoted by ruij^kiisi, Sj) and mij{si) 
respectively. With reference to the portion of the lattice depicted in Figure the 
probabilities, according to Equation (|82|l . can be written as 

Pi{si) cx exp[-iJj(si)] m^^aisi) mi.j{si) rriijisi) m^^hisi), 

Pi.j{s^,Sj) cx e^p[-Hij{si, Sj)]m^^a{si) mij{si) m^^h{si) x 

X mj^i,{sj) ruj^cisj) mj^kisj) rriij^abisi, Sj) niijjkisi, Sj), 
Pijki{si, Sj,Sk, si) (X e^p[-Hijki{si,Sj, Sk, si)] mi^a{si) mi^h{si) mj^bisj) mj^dsj) x 

X mkA{sk) mk,e{sk) mi,f{si) mi^g{si) x 

X mtj^abisi,Sj) mjk,cd{sj, Sk) mki,ef{sk, si) mij^ghisi, sj). (83) 
Imposing node-pair and pair-square constraints one gets equations like 
exp[-Hi{si)]mi^j{si) oc ^ exp[-i?y (s^, s^)] x 



X mj^b{sj) nij^dsj) mj_k{s.j) mij^ab{si, Sj) mijjk{si, Sj), 

E 



exp[-Hij{si, Sj)]mij{si) nij^kisj) rriij^ikisi, Sj) cx exp[~Hijki{si, sj, Sk, si)] x 



X mk,d{sk) mk,e{sk) mjisi) m.gisi) X 

X mjk,cdisj, Sk) mki,efisk, si) mij^ghisi, Sj). (84) 
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Figure 14. A portion of tlie square lattice 



The above equations can be viewed as a set of equations in the updated messages at 
iteration t + 1, appearing in the l.h.s., given the messages at iteration t, appearing in 
the r.h.s.. It is clear that one has first to calculate the updated pair-to-site messages 
according to the first equation, and then the updated square-to-pair messages 
according to the second one, using in the l.h.s. the updated pair-to-site messages 
just obtained. 

GBP (possibly with damping) typically exhibits better convergence properties 
(and greater accuracy) than BP, but the empirical rule that a sufficient amount 
of frustration can make it not convergent is valid also for GBP. It is therefore 
fundamental to look for provably convergent algorithms, which will be discussed in the 
next subsection. A variation of the BP algorithm, the conditioned probability (CP) 
algorithm, with improved convergence properties, has recently been introduced j95j . 
The extension of this algorithm beyond the BP level is however not straightforward. 

We conclude the present subsection by mentioning that techniques like the 
Thouless- Anderson-Palmer equations, or the cavity method, both widely used in 
the statistical physics of spin glasses, are strictly related to the Bethe-Peierls 
approximation. 

The Thouless-Anderson-Palmer |^ equations can be derived from the Bethe- 
Peierls free energy for the Ising model, through the so-called Plefka expansion |97j . 
One has first to write the free energy as a function of magnetizations and nearest- 
neighbour correlations through the parameterization 

P^{si)^ ^ Pij{Si,Sj) = ^ (85) 

then to solve analytically the stationarity conditions with respect to the c^'s and 
finally to expand to second order in the inverse temperature. 

Finally, the cavity method |98[ 1991 [11711] is particularly important since it allows to 
deal with replica symmetry breaking. The cavity method, though historically derived 
in a different way, can be regarded as an alternative choice of messages and effective 
fields in the Bethe-Peierls approximation. With reference to Equation l(7!^. introduce 
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messages mk^a{sk) from variable nodes to factor nodes according to 

mk^a{sk) = W_-mi,^k{sk). (86) 
feeb 

Then the probabiHties Equation \TA\ become 
Pi{si) = -^Wrna^i{si) 

PaiSa) = -^IpaiSa) mfc^a(sfc), (87) 

and the message update equations (|76|l become 

ma-,i{Si) CX ^ i->a{Sa.) W mk^a{sk)- (88) 
Sa,\i k^a 

The effective fields corresponding to the factor-to-variable messages ma^i{si) are 
usually called cavity biases, while those corresponding to the variable-to-factor 
messages mi^a{si) are called cavity fields. In the Ising example above a factor node 
is just a pair of NNs and cavity biases reduce to effective fields /i^j , while cavity fields 

take the form /ii,fc- 

feNNi 

The cavity method admits an extension to cases where one step of replica 
symmetry breaking occurs 100, 101 . In such a case one assumes that there exist many 
states characterized by different values of the cavity biases and fields, and introduces 
the probability distributions of cavity biases and fields over the states. From the above 
message update rules one can then derive integral equations, similar to Equation H70|l . 
for the distributions. These integral equations can in principle be solved by iterative 
population dynamics algorithms, but most often one restricts to the zero temperature 
case, where these distributions have a discrete support. 

The zero temperature case is particularly relevant for hard combinatorial 
optimization problems, where 1-step replica symmetry breaking corresponds to 
clustering of solutions. Clustering means that the space of solutions becomes 
disconnected, made of subspaces which cannot be reached from one another by means 
of local moves, and hence all local algorithms, like BP or GBP, are bound to fail. The 
cavity method has been used to solve these kind of problems in the framework of the 
survey propagation algorithm |25| . which has been shown to be a very powerful tool 
for constraint satisfaction problems like satisfiability (SHj and colouring |2II defined 
on finite connectivity random graphs. These graphs are locally tree-like and therefore 
all the analysis can be carried out at the Bethe-Peierls level. A sort of generalized 
survey propagation capable of dealing with short loops would really be welcome, but 
it seems that realizability issues are crucial here and replica symmetry breaking can 
only be introduced when CVM gives an exact solution. 

A different approach, still aimed to generalize the BP algorithm to situations 
where replica symmetry breaking occurs, has been suggested by van Mourik [2H) . and 
is based on the analysis of the time evolution of the BP algorithm. 
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6.3. Variational algorithms 

In the present subsection we discuss algorithms which update probabihties instead 
of messages. At every iteration a new estimate of probabihties, and hence of the 
free energy, is obtained. These algorithms are typically provably convergent, and the 
proof is based on showing that the free energy decreases at each iteration. This is 
of course not possible with BP and GBP algorithms, where the probabilities and the 
free energy can be evaluated only at the fixed point. The price one has to pay is 
that in variational algorithms one has to solve the compatibility constraints at every 
iteration, and therefore these are double loop algorithms, where the outer loop is used 
to update probabilities and the inner loop is used to solve the constraints. 

The natural iteration method (NIM) ^102.. 103) is the oldest algorithm specifically 
designed to minimize the CVM variational free energy. It was originally introduced 
| 1U2| in the context of homogeneous models, for the pair and tetrahedron (for the fee 
lattice) approximations. In such cases the compatibility constraints are trivial. Later 
| 103| it was generalized to cases where the compatibility constraints cannot be solved 
trivially. An improved version of the algorithm, with tunable convergence properties, 
appeared in |ilfl4, and its application is described in some detail also in |1()5| . where 
higher order approximations are considered. 

The algorithm is based on a double loop scheme, where the inner loop is used to 
solve the compatibility constraints, so that at each iteration of the outer loop a set of 
cluster probabilities which satisfy the constraints is obtained. 

Proof of convergence, based on showing that the free energy decreases at every 
outer loop iteration, exist in many cases, but it has also been shown that there are 
non-convergent cases, like the four-dimensional Ising model |lU6j in the hypercube 
approximation. 

We do not discuss in detail this algorithm since it is rather slow, and better 
alternatives have been recently developed. 

A first step in this direction was the concave-convex procedure (CCCP) by 
Yuille |1U7) . who started from the observation that the non-convergence problems 
of message-passing algorithms arise from concave terms in the variational free energy, 
that is from the entropy of clusters with negative Mobius numbers. His idea was then 
to split the CVM free energy into a convex and a concave part, 

T{{pc,}) = T.cA{Pa}) + ^cavo({Pa}), (89) 

and to write the update equations to be iterated to a fixed point as 

V.F_({P^^)}) = -V.F_e(U*^}), (90) 

where pi*^ and pa^^"^ are successive iterates. In order to solve the compatibility 
constraints, at each iteration of Equation (|90|l . the Lagrange multipliers enforcing the 
constraints are determined by another iterative algorithm where one solves for one 
multiplier at a time, and it can be shown that the free energy decreases at each outer 
loop iteration. Therefore we have another double loop algorithm, which is provably 
convergent, faster than NIM (as we shall see below), and allows some freedom in the 
splitting between convex and concave parts. 

A more general and elegant formalism, which will be described in the following, 
has however been put forward by Heskes, Albers and Kappen (HAK) ^4^. Their basic 
idea is to consider a sequence of convex variational free energies such that the sequence 
of the corresponding minima tends to the minimum of the CVM free energy. More 
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precisely, if the CVM free energy T{{pa,a G R}) is denoted for simplicity by J-{p), 
they consider a function ^conv(PjP')j convex in p, with the properties 

onv 

^co„v(p,p) =^(p). (91) 

The algorithm is then defined by the update rule for the probabilities 

= argmin^co„v(p,p'*'), (92) 
p 

and it is easily proved that the free energy decreases at each iteration and that a 
minimum of the CVM free energy is recovered at the fixed point. 

A lot of freedom is left in the definition of Tconv, a-nd strategies of varying 
complexity and speed can be obtained. NIM (when convergent) and CCCP can also 
be recovered as special cases. The general framework is based on the following three 
properties. 

(i) If /3 C a, then 

- Sa+ Sp = '^Pa{Sa) h\pa{Sa) - p p {s p) 111 p {s (93) 

is convex over the constraint set, i.e. it is a convex function of p^ and pp if these 
satisfy the compatibility constraint Equation ()23|l . 

(ii) The linear bound 

3/3 = -^p/3(s/3)lnp/3(s/3) < -^pp{sp)\npi^{sp) = S'p (94) 

Sfi Bfi 

holds, with equality only for = pp 

(iii) If 7 C and pp and p^ {p'^ and p'^) satisfy the compatibility constraints, the 
bound 

Sp- Sj = - ^pp{sp)\npp{sp) + ^p^{s^)lnp^{s^) < 

Sfi Sj 

< -^p/3(s0)lnp;3(s^)+^p^(s^)lnp;(s^) =.5;^-^; (95) 

Sfi s~f 

holds, and it is tighter than the previous bound. A tighter bound typically entail 
faster convergence. 

In order to give an example, consider again the CVM square approximation for a 
model on a regular square lattice with periodic boundary conditions and focus on the 
entropy part of the free energy, which according to the entropy expansion Equation 
H48|l has the form 

This contains both convex (from square and site entropy) and concave terms (from 
pair entropy). Notice that the numbers of plaquettes is the same as the number of 
sites, while there are two pairs (e.g. horizontal and vertical) per site. This implies that 
the free energy is not convex over the constraint set. 
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Several bounding schemes are possible to define ^conv For instance, one can 
obtain a function which is just convex over the constraint set by applying property 
(iii) to the site terms and half the pair terms, with the result 

□ (ij) » □ (ij) (ij) 

In the following the HAK algorithm will always be used with this bounding scheme. 

The NIM can be obtained if, starting from the above expression, one applies 
property (ii) to the not yet bounded pair terms, with the result 

- E + E - E ^ - E + E - E ^^ ( 

This is clearly a looser bound than the previous one, and hence it leads to a (much) 
slower algorithm. In the general case, the NIM (which of course was formulated 
in a different way) can be obtained by bounding all entropy terms except those 
corresponding to the maximal clusters. This choice does not always lead to a convex 
bound (though in most practically relevant cases this happens) and hence convergence 
is not always guaranteed. 

The CCCP recipe corresponds to bounding every convex {ap < 0) term by 

- apSp < -Sfj + (1 - ap)S'f^, (99) 

using property (ii). In the present case this gives 

- E + E - E ^ - E - E + 2 E - E ( i 

which is convex independently of the constraints, and hence the bound is again looser 
than Equation (PTjl 

In all cases one is left with a double loop algorithm, the outer loop being defined 
by the update rule for probabilities, and the inner loop being used for the minimization 
involved in Equation (|92|l . This minimization is simpler than the original problem, 
since the function to be minimized is convex. In each of the above schemes a particular 
technique was proposed for the convex minimization in the inner loop, and here these 
will not be covered in detail. 

A point which is important to notice here is that the bounding operation gives 
a new free energy which is structurally different from a CVM free energy. It must 
be minimized with respect to p at fixed p' and, viewed as a function of p, it contains 
an entropy expansion with coefficients which do not satisfy anymore the Mobius 
relation 11911 (for instance, in the "just convex over the constraint set" scheme, we have 

= 1, aij = —1/2 and = 0). This means that a message-passing algorithm like 
parent-to-child GBP, which relies on the Mobius property, cannot be applied. In IS^ 
a different message-passing algorithm, which can still be viewed as a GBP algorithm, 
is suggested. 

Observe also that there are entropy-like terms S'^ which are actually linear in pp 
and must therefore be absorbed in the energy terms. 

The main reason for investigating these double loop, provably convergent 
algorithms, is the non-convergence of BP and GBP in frustrated cases. Since BP 
and GBP, when they converge, are the fastest algorithms for the determination of 
the minima of the CVM free energy, it is worth making some performance tests to 
evaluate the speed of the various algorithms. The CPU times reported below refer to 
an Intel Pentium 4 processor at 3.06 GHz, using g77 under GNU/Linux. 




Consider first a chain of N Ising spins, with ferromagnetic interactions J > and 
random bimodal fields hi independently drawn from the distribution 

p{hi} = ^S{h, - ho) + ^6{h, + ho). (101) 

The boundary conditions are open, and the model is exactly solved by the CVM 
pair approximation. The various algorithms described are run from a disordered, 
uncorrelated state and stopped when the distance between two successive iterations, 
defined as the sum of the squared variations of the messages (or the probabilities, or 
the Lagrange multipliers, depending on the algorithm and the loop - outer or inner 
- considered). Figure [T5l reports the CPU times obtained with several algorithms, 
for the case J = 0.1, ho = 1. The HAK algorithm is not reported since it reduces 
to BP due to the convexity of the free energy. It is seen that the CPU time grows 
linearly with N for all algorithms except NIM, in which case it goes like N^. Despite 
the common linear behaviour, there are order of magnitude differences between the 
various algorithms. While BP and CP converges in 4 and 9 seconds respectively for 
N = IQS, CCCP takes 15 seconds for N = 10^. For NIM, finally, the fixed point is 
reached in 12 seconds for N = 10^. 

As a further test, consider, again at the level of the pair approximation, the 
two-dimensional Edwards-Anderson spin glass model, defined by the Hamiltonian 
Equation (jSJ with hi = and random bimodal interactions independently drawn 
from the distribution 

p(Jy) = (1 -p)(5(J,, - J) +p(5(Jy + J). (102) 

Here the frustration effects are even more important and the non-convergence problem 
of BP becomes evident. As a rule of thumb, when the temperature, measured by J~^, 




Figure 16. CPU times (seconds) for the 2d Edwards-Anderson model in the 
paramagnetic phase 



is small enough and p (the fraction of antiferromagnetic bonds) is large enough, the BP 
algorithm stops converging. The condition for the instability of the BP fixed point 
has been computed, in the average case, for Ising spin glass models with pairwise 
interactions |93|. In order to compare algorithm performances, Figure [TCI reports 
CPU times vs L for N — lattices with periodic boundary conditions, J = 0.2 and 
p — 1/2, that is well into the paramagnetic phase of the model. The initial guess is a 
ferromagnetic state with rrii — 0.9, Vi. It is seen that the CPU times scale roughly as 
A^^-^ for all the algorithms considered except NIM, which goes like N^-^ . Again the 
algorithms with linear behaviour are separated by orders of magnitude. For L = 320 
BP converges in 6 seconds, HAK in 370 seconds and CCCP in 2460 seconds. 

CP has not been considered in the present and the following tests, although 
empirically it is seen that its behaviour is rather close to the HAK algorithm. Its 
performance is however severely limited as soon as one considers variable with more 
than two states, due to a sum over the configurations of the neighbourhood of a NN 
pair. 

A similar comparison can be made in the ferromagnetic phase, setting J — 0.5 
and p = 0.1. Here the CPU times for the BP algorithm exhibit large fluctuations for 
different realizations of the disorder, and the data reported are obtained by averaging 
over 30 such realizations. Now all algorithms exhibit comparable scaling properties, 
with CPU times growing like N^-^ ^ N^-'^ . As far as absolute values are concerned, 
for L ~ 50 convergence is reached in 4, 44, 680 and 1535 seconds by BP, HAK, CCCP 
and NIM respectively. 

A similar scaling analysis was not possible into the glassy phase (which is 
unphysically predicted by the pair approximation), due to non-convergence of BP 




and too large fluctuations of the convergence time of the other algorithms. 

As a general remark we observe that BP is the fastest algorithm available 
whenever it converges. Among the provably convergent algorithms, the fastest one 
turns out to be HAK, at least in the "just convex over the constraints" scheme 
which was used here. 



7. Conclusions 

Some aspects of the cluster variation method have been briefly reviewed. The emphasis 
was on recent developments, not yet covered by the 1994 special issue of Progress of 
Theoretical Supplement and the focus was on the methodological aspects rather 
than on the applications. 

The discussion has been based on what can be considered the modern formulation 
of the CVM, due to An "ST, based on a truncation of the cumulant expansion of the 
entropy in the variational principle of equilibrium statistical mechanics. 

The advancements in this last decade were often due to the interaction between 
two communities of researchers, working on statistical physics and, in a broad sense, 
probabilistic graphical models for inference and optimization problems. The interest 
of both communities is currently on heterogeneous problems, while in the past the 
CVM was most often applied to translation invariant lattice models (in this topic, 
the only new advancements discussed have been the attempts to extract information 
about critical behaviour from CVM results). The more general point of view that has 
to be adopted in studying heterogeneous problems has been crucial to achieve many 
of the results discussed. 
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The formal properties of the CVM have been better understood by comparing 
it with other region-based approximations, Uke the junction graph method or the 
most general formulation of the Bethe-Peierls approximation (the lowest order of the 
CVM), which can treat also non-pairwise interactions. Studying realizability, that is 
the possibility of reconstructing a global probability distribution from the marginals 
predicted by the CVM, has led to the discovery of non-tree-like models for which the 
CVM gives the exact solution. 

A very important step was made by understanding that belief propagation, a 
message-passing algorithm widely used in the literature on probabilistic graphical 
models, has fixed points which correspond to stationary points of the Bethe-Peierls 
approximation. The belief propagation can thus be regarded as a powerful algorithm 
to solve the CVM variational problem, that is to find minima of the approximate 
free energy, at the Bethe-Peierls level. This opened the way to the formulation of 
generalized belief propagation algorithms, whose fixed points correspond to stationary 
points of the CVM free energy, at higher level of approximation. 

Belief propagation and generalized belief propagation are certainly the fastest 
available algorithms for the minimization of the CVM free energy, but they often 
fail to converge. Typically this happens when the problems under consideration are 
sufficiently frustrated. In order to overcome this difficulty double loop, provably 
convergent algorithms have been devised, for which the free energy can be shown 
to decrease at each iteration. These are similar in spirit to the old natural iteration 
method by Kikuchi, but orders of magnitude faster, though not as fast as BP and 
GBP. 

When the frustration due to competitive interactions or constraints is very strong, 
like in spin-glass models in the glassy phase or in constraints satisfaction problems in 
the hard regime, even double loop algorithms become useless, since we are faced with 
the problem of replica symmetry breaking, corresponding to clustering of solutions. 
Very important advancements have been made in recent years by extending the belief 
propagation algorithm into this domain. These results are in a sense at the border 
of the CVM, since they are at present confined to the lowest order of the CVM 
approximation, that is the Bethe-Peierls approximation. 

It will be of particular importance, in view of the applications to hard optimization 
problems with non-tree-like structure, to understand how to generalize these results 
to higher order approximations. 
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